x86/time: implement tsc as clocksource

Recent x86/time changes improved much of the monotonicity in Xen
timekeeping, making it much harder to observe time going backwards.
However, the platform timer can't be expected to be perfectly in sync
with the TSC, so get_s_time isn't guaranteed to always return
monotonically increasing values across CPUs. This is the case on some
of the boxes I am testing with, where I sometimes observe ~100 warps
(of very few nanoseconds each) after a few hours.

This patch introduces support for using the TSC as platform time
source, which is the highest-resolution time source and the cheapest
to read. There are, however, several problems associated with its
usage, and there isn't a complete (and architecturally defined)
guarantee that all machines will provide a reliable and monotonic TSC
in all cases (I believe Intel to be the only vendor that can guarantee
that?). For this reason it's not used unless the administrator sets
the "clocksource" boot option to "tsc". Initializing the TSC
clocksource requires all CPUs to be up so that the TSC reliability
checks can be performed. init_xen_time is called before all CPUs are
up, so we would, for example, start with HPET (or ACPI, PIT) at boot
time and switch to TSC later. The switch then happens in the
verify_tsc_reliability initcall, which is invoked once all CPUs are up.

When attempting to initialize TSC we also check for time warps and
whether the TSC is invariant. Note that while we deem a CONSTANT_TSC
with no deep C-states reliable, that might not always be the case, so
we're conservative and allow TSC to be used as platform timer only
with invariant TSC. Additionally we check that CPU hotplug isn't meant
to be performed on the host, which is the case when max vcpus and
num_present_cpu are the same. This is because a newly hotplugged CPU
may not satisfy the condition of having all TSCs synchronized, so
while the tsc clocksource is in use we allow offlining CPUs but not
onlining any of them back. Finally we prevent TSC from being used as
clocksource on multiple sockets because it isn't guaranteed to be
invariant there. Relaxing this last requirement is done in a separate
patch, which allows vendors providing such a guarantee to use TSC as
clocksource. If any of these conditions is not met, we keep the
clocksource that was previously initialized in init_xen_time.
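
For illustration only (this is not the code in the patch), the
conditions for switching to TSC can be sketched as a single predicate.
The struct, its field names and the function name below are
hypothetical and only mirror the checks described in this message:

```c
#include <stdbool.h>

/* Hypothetical snapshot of the host properties the checks look at. */
struct tsc_host_state {
    bool invariant_tsc;        /* CPU advertises an invariant TSC */
    bool warp_detected;        /* reliability check saw a time warp */
    unsigned int max_cpus;     /* CPUs the host could ever bring up */
    unsigned int present_cpus; /* CPUs present at boot */
    unsigned int nr_sockets;   /* populated sockets */
};

/*
 * Return true only if every condition holds; otherwise the caller
 * keeps the clocksource chosen earlier (HPET, ACPI or PIT).
 */
static bool tsc_clocksource_usable(const struct tsc_host_state *s)
{
    if (!s->invariant_tsc)
        return false;          /* CONSTANT_TSC alone is not enough */
    if (s->warp_detected)
        return false;          /* TSCs were seen going backwards */
    if (s->max_cpus != s->present_cpus)
        return false;          /* CPU hotplug could add an unsynced TSC */
    if (s->nr_sockets > 1)
        return false;          /* no cross-socket guarantee assumed */
    return true;
}
```

A caller would evaluate this once all CPUs are up and leave the
existing clocksource untouched whenever it returns false.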

Since b64438c7c ("x86/time: use correct (local) time stamp in
constant-TSC calibration fast path"), updates to cpu time use local
stamps, which means the platform timer is only used to seed the
initial cpu time. We further introduce a new rendezvous function
(nop_rendezvous) which doesn't require synchronization between master
and slave CPUs: it just reads the calibration_rendezvous struct and
writes the stime and stamp to the cpu_calibration struct to be used
later on. With clocksource=tsc there is no need to be in sync with
another clocksource, so we reseed the local/master stamps with TSC
values and update the platform time stamps accordingly. Time
calibration is set to run 1sec after we switch to TSC, so these stamps
are reseeded to also ensure monotonically increasing values right
after the point we switch to TSC. This removes the possibility of
inconsistent readings in this short period (i.e. until calibration
fires).
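
As a rough sketch of the idea behind nop_rendezvous (again not the
code in the patch), the function below copies the master's values into
a per-CPU record without any handshake. The struct layouts, field
names and the explicit pointer arguments are assumptions made so the
example is self-contained; the real code operates on per-CPU data:

```c
#include <stdint.h>

/* Hypothetical layout: values published by the master CPU. */
struct calibration_rendezvous {
    uint64_t master_stime;     /* system time at the master stamp */
    uint64_t master_tsc_stamp; /* TSC value at the master stamp */
};

/* Hypothetical layout: per-CPU record consumed by later calibration. */
struct cpu_calibration {
    uint64_t local_stime;
    uint64_t local_tsc;
    uint64_t master_stime;
};

/*
 * No master/slave synchronization loop: read what the master
 * published and write it down for this CPU to use later on.
 */
static void nop_rendezvous(const struct calibration_rendezvous *r,
                           struct cpu_calibration *c)
{
    c->local_tsc    = r->master_tsc_stamp;
    c->local_stime  = r->master_stime;
    c->master_stime = r->master_stime;
}
```

Because every CPU records the same stime/stamp pair, there is nothing
to wait for, which is what makes the handshake unnecessary when the
TSC itself is the platform time source.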

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>